7 research outputs found
New Algorithms for Predicting Conformational Polymorphism and Inferring Direct Couplings for Side Chains of Proteins
Protein crystals populate diverse conformational ensembles. Despite much evidence that
there is widespread conformational polymorphism in protein side chains, most of the x-ray
crystallography data are modelled by single conformations in the Protein Data Bank.
The ability to extract or to predict these conformational polymorphisms is of crucial importance,
as it facilitates deeper understanding of protein dynamics and functionality.
This dissertation describes a computational strategy capable of predicting side-chain polymorphisms.
The applied approach extends a particular class of algorithms for side-chain
prediction by modelling the side-chain dihedral angles more appropriately as continuous
rather than discrete variables. Employing a new inferential technique known as particle
belief propagation (PBP), we predict residue-specific distributions that encode information
about side-chain polymorphisms. The predicted polymorphisms are in relatively close
agreement with results from a state-of-the-art approach based on x-ray crystallography
data. This approach characterizes the conformational polymorphisms of side chains using
electron density information, and has successfully discovered previously unmodelled
conformations.
Furthermore, it is known that coupled fluctuations and concerted motions of residues
can reveal pathways of communication used for information propagation in a molecule
and hence can help in understanding the "allostery" phenomenon in proteins. In order
to characterize the coupled motions, most existing methods infer structural dependencies
among a protein's residues. However, recent studies have highlighted the role of coupled
side-chain fluctuations alone in the allosteric behaviour of proteins, in contrast to a
common belief that the backbone motions play the main role in allostery. These studies
and the aforementioned recent discoveries about prevalent alternate side-chain conformations
(conformational polymorphism) accentuate the need to devise new computational
approaches that acknowledge side chains' roles. These approaches must also consider
the polymorphic nature of the side chains, and incorporate effects of this phenomenon
(polymorphism) in the study of information transmission and functional interactions of
residues in a molecule. Such frameworks can provide a more accurate understanding of the
allosteric behaviour.
Hence, as a topic related to conformational polymorphism, this dissertation also addresses
the problem of inferring directly coupled side chains. First, we present a
novel approach to generate an ensemble of conformations and an efficient computational
method to extract direct couplings of side chains in allosteric proteins. These direct couplings
are used to provide sparse network representations of the coupled side chains. The
framework is based on a fairly new statistical method, named graphical lasso (GLASSO),
devised for sparse graph estimation. In the proposed GLASSO-based framework, the side-chain
conformational polymorphism is taken into account. It is shown that by studying
the intrinsic dynamics of an inactive structure alone, we are able to construct a network of
functionally crucial residues. Second, we show that the proposed method is capable of providing
a magnified view of the coupled and conformationally polymorphic side chains. This
model reveals couplings between the alternate conformations of a coupled residue pair. To
the best of our knowledge, this is the first computational method for extracting networks
of side chains' alternate conformations. Such networks help in providing a detailed image
of side-chain dynamics in functionally important and conformationally polymorphic sites,
such as binding and/or allosteric sites. This information may assist in new drug-design
alternatives.
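
The difference between a direct coupling and a merely correlated pair can be illustrated with a small numerical sketch. This is not the dissertation's GLASSO-based framework: it uses plain (unregularized) inversion of a sample covariance matrix, whereas graphical lasso adds an L1 penalty to obtain a sparse estimate. The three variables, the chain structure x -> y -> z, and all coefficients are hypothetical stand-ins for side-chain fluctuation data.

```python
import numpy as np

# Hypothetical toy data: x drives y, and y drives z. There is no direct x-z link.
rng = np.random.default_rng(0)
n_samples = 5000
x = rng.normal(size=n_samples)
y = 0.8 * x + rng.normal(scale=0.6, size=n_samples)
z = 0.8 * y + rng.normal(scale=0.6, size=n_samples)
data = np.column_stack([x, y, z])

# Ordinary correlation mixes direct and indirect effects.
corr = np.corrcoef(data, rowvar=False)

# Partial correlation, read off the precision (inverse covariance) matrix,
# measures the coupling of two variables given all the others.
precision = np.linalg.inv(np.cov(data, rowvar=False))
scale = np.sqrt(np.diag(precision))
partial = -precision / np.outer(scale, scale)
np.fill_diagonal(partial, 1.0)

print("correlation x-z:        ", round(float(corr[0, 2]), 3))
print("partial correlation x-z:", round(float(partial[0, 2]), 3))
```

In this toy chain, x and z are clearly correlated, yet their partial correlation is close to zero, so a network built from the precision matrix would keep only the x-y and y-z edges.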
Side-chain conformations are commonly represented by multivariate angular variables.
However, the GLASSO and other existing methods that can be applied to the aforementioned
inference task are not capable of handling multivariate angular data. This dissertation
further proposes a novel method to infer direct couplings from this type of data, and
shows that this method is useful for identifying functional regions and their interactions in
allosteric proteins. The proposed framework is a novel extension of canonical correlation
analysis (CCA), which we call "kernelized partial CCA" (or simply KPCCA). Using the
conformational information and fluctuations of the inactive structure alone for allosteric
proteins in the Ras and other Ras-like families, the KPCCA method identified allosterically
important residues not only as strongly coupled ones but also in densely connected
regions of the interaction graph formed by the inferred couplings. The results were in good
agreement with other empirical findings and outperformed those obtained by the GLASSO-based framework. By studying distinct members of the Ras, Rho, and Rab sub-families,
we show further that KPCCA is capable of inferring common allosteric characteristics in
the small G protein super-family.
Bayesian Optimization Algorithm for Non-unique Oligonucleotide Probe Selection
One important application of DNA microarrays is measuring the expression levels of genes. The quality of the microarray design, which includes selecting short oligonucleotide sequences (probes) to be affixed to the surface of the microarray, is therefore a major issue. A good design contains the minimum possible number of probes while retaining an acceptable ability to identify the targets present in the sample. We focus on the problem of computing the minimal set of probes that is able to identify each target of a sample, referred to as Non-unique Oligonucleotide Probe Selection. We present the application of an Estimation of Distribution Algorithm named Bayesian Optimization Algorithm (BOA) to this problem, and consider the integration of BOA with one simple heuristic. We also present the application of our method, integrated with a decoding approach in a multiobjective optimization framework, for solving the problem when multiple targets are present in the sample.
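
To make the selection problem concrete, the following is a minimal sketch under stated assumptions. It is a plain greedy heuristic, not the Bayesian Optimization Algorithm described in the abstract, and it covers only the case of a single target present in the sample. The function name, the probe and target labels, and the hybridization table are all hypothetical. A probe set is treated as valid when every target binds at least one chosen probe and every pair of targets is told apart by at least one chosen probe.

```python
from itertools import combinations


def greedy_probe_selection(hybridizes):
    """Pick probes greedily until all targets are covered and pairwise separated.

    hybridizes: dict mapping a probe name to the set of targets it binds
    (a hypothetical input format chosen for this sketch).
    """
    targets = sorted(set().union(*hybridizes.values()))
    pairs = list(combinations(targets, 2))

    def satisfied_by(probe):
        bound = hybridizes[probe]
        done = {("cover", t) for t in bound}
        # A probe separates two targets when it binds exactly one of them.
        done |= {("separate", a, b) for a, b in pairs if (a in bound) != (b in bound)}
        return done

    gains = {probe: satisfied_by(probe) for probe in hybridizes}
    remaining = {("cover", t) for t in targets} | {("separate", a, b) for a, b in pairs}
    chosen = []
    while remaining:
        # Sorting makes tie-breaking deterministic.
        best = max(sorted(gains), key=lambda p: len(gains[p] & remaining))
        if not gains[best] & remaining:
            raise ValueError("the candidate probes cannot distinguish all targets")
        chosen.append(best)
        remaining -= gains[best]
    return chosen


# Hypothetical candidate probes and the targets each one binds.
probes = {
    "p1": {"t1", "t2"},
    "p2": {"t2", "t3"},
    "p3": {"t1"},
    "p4": {"t3"},
}
chosen = greedy_probe_selection(probes)
print(chosen)
```

A greedy pass like this gives no optimality guarantee; it only shows the kind of constraint that a search method such as BOA has to satisfy while minimizing the number of probes.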
A Comparative Study of Cluster Detection Algorithms in Protein–Protein Interaction for Drug Target Discovery and Drug Repurposing
The interactions between drugs and their target proteins induce altered expression of genes involved in complex intracellular networks. The properties of these functional network modules are critical for the identification of drug targets, for drug repurposing, and for understanding the underlying mode of action of the drug. The topological modules generated by a computational approach are defined as functional clusters. However, the functions inferred for these topological modules extracted from a large-scale molecular interaction network, such as a protein–protein interaction (PPI) network, could differ depending on the cluster detection algorithm used. Moreover, the dynamic gene expression profiles among tissues or cell types cause differential functional interaction patterns between the molecular components. Thus, the connections in the PPI network should be modified by the transcriptomic landscape of specific cell lines before producing topological clusters. Here, we systematically investigated the clusters of a cell-based PPI network by using four cluster detection algorithms. We subsequently compared the performance of these algorithms for target gene prediction, which integrates gene perturbation data with the cell-based PPI network using two drug target prioritization methods, shortest path and diffusion correlation. In addition, we validated the proportion of perturbed genes in clusters by finding candidate anti-breast cancer drugs and confirming our predictions using literature evidence and cases in ClinicalTrials.gov. Our results indicate that the Walktrap (CW) clustering algorithm achieved the best performance overall in our comparative study.
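
The idea behind shortest-path target prioritization can be sketched as follows. The abstract names the method but does not define its scoring, so the rule used here (rank candidates by mean shortest-path distance to the perturbed genes, smaller being better) is an assumption made for illustration. The network, the node names, and the helper functions are likewise hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Unweighted shortest-path distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def rank_targets(graph, candidates, perturbed):
    """Order candidates by mean distance to the perturbed genes (assumed score)."""
    scores = {}
    for candidate in candidates:
        dist = bfs_distances(graph, candidate)
        reachable = [dist[g] for g in perturbed if g in dist]
        scores[candidate] = sum(reachable) / len(reachable) if reachable else float("inf")
    return sorted(candidates, key=lambda c: (scores[c], c))


# Hypothetical path-shaped PPI network: A - B - C - D - E
ppi = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
ranking = rank_targets(ppi, candidates=["A", "C"], perturbed=["D", "E"])
print(ranking)
```

In a cell-line-specific setting, the graph passed in would be the PPI network after edges have been filtered or reweighted by expression data, which this sketch does not attempt.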